Reconstructing the free-energy landscape of 
Met-enkephalin using dihedral Principal Component 
Analysis and Well-tempered Metadynamics 
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Abstract 

Well-Tempered Metadynamics (WTmetaD) is an 
efficient method to enhance the reconstruction of 
the free-energy surface of proteins. WTmetaD 
guarantees a faster convergence in the long time 
limit in comparison with the standard metady- 
namics. It still suffers however from the same 
limitation, i.e. the non trivial choice of perti- 
nent collective variables (CVs). To circumvent 
this problem, we couple WTmetaD with a set of 
CVs generated from a dihedral Principal Compo- 
nent Analysis (dPCA) on the Ramachadran dihe- 
dral angles describing the backbone structure of 
the protein. The dPCA provides a generic method 
to extract relevant CVs built from internal coordi- 
nates. We illustrate the robustness of this method 
in the case of the small and very diffusive Met- 
enkephalin pentapeptide, and highlight a criterion 
to limit the number of CVs necessary to biased the 
metadynamics simulation. The free-energy land- 
scape (FEL) of Met-enkephalin built on CVs gen- 
erated from dPCA is found rugged compared with 
the FEL built on CVs extracted from PCA of the 
Cartesian coordinates of the atoms. 

1 Introduction 

Since the late 1980s emerges the idea that a global 
overview of the protein's energy surface is of 
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paramount importance for a quantitative under- 
standing of the relationships between structure, 
dynamics, stability, and functional behavior of 
proteins.- - - Thanks to the continuous increase of 
the computing power— ^ and of the reliability of 
empirical force fields,— 1 ^ all-atom molecular dy- 
namics (MD) simulations become a widely em- 
ployed computational technique to simulate the 
dynamics of complex systems such as proteins 
through discrete integration of the Newton's equa- 
tions of motions of each atom. Under the assomp- 
tion of ergodicity one can then derive the equi- 
librium properties of interest of a protein as time 
averages of the corresponding observables along 
the MD trajectories and compute the associated 
free-energy. However in several cases all-atom 
MD simulations are still not competitive to de- 
scribe the protein conformational dynamics, due 
to the fact that using an atomistic model is com- 
putationally expensive, as sufficiently realistic po- 
tential energy functions are intrinsically complex. 
Moreover, most phenomena of interest take place 
on time scales that are orders of magnitude larger 
than the accessible time that can be currently sim- 
ulated with classical all-atom molecular dynam- 
ics.— This issue can be addressed either by re- 
ducing the dimension of the conformational space 
explored by the protein during the MD simula- 
tions by using a coarse-grained representation of 
the protein structure, where each residue has only 
a few degrees of freedom and where the solvent 
surrounding the protein is implicit,— 1 ^ or by ac- 
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celerating the exploration of the conformational 
space in (all-atom) MD simulations. In the lat- 
ter case, a large variety of methods referred to 
as enhanced sampling techniques have been pro- 
posed.—"— They exploit a methodology aimed at 
accelerating rare events and based on constrained 
MD. Metadynamics (metaD )21~— belongs to this 
class of methods: it enhances the sampling of the 
conformational space of a system along a few se- 
lected degrees of freedom, named collective vari- 
ables (CVs) and reconstructs the probability dis- 
tribution as a function of these CVs. In a nut- 
shell, the dynamics in the space of the chosen CVs 
is enhanced by disfavoring already visited regions 
through the use of a history-dependent potential, 
constructed as a sum of Gaussians centered along 
the trajectory followed by the CVs. This sum 
of Gaussians is then exploited to reconstruct it- 
eratively an estimator of the free-energy surface 
spanned by the CVs in the region explored dur- 
ing the biased dynamics. Well-Tempered Meta- 
dynamics 3 ^ (WTmetaD) is the most recent variant 
of the method, which, because of its convergence 
properties, is the most widely adopted version of 
the metadynamics algorithm. In WTmetaD, the 
bias deposition rate decreases over simulation time 
and the dynamics of all the microscopic variables 
becomes progressively closer to thermodynamic 
equilibrium as the simulation proceeds, making 
the bias to converge to its limiting value in a sin- 
gle run and avoiding the problem of overfilling, 
i.e. when the height of the accumulated Gaussians 
largely exceeds the true barrier height. 

However, WTmetaD suffers from the same lim- 
itation than standard metaD, i.e. its succes de- 
pends on the critical choice of a reasonable num- 
ber of relevant CVs. All the relevant slow vary- 
ing degrees of freedom must be catched by the 
CVs. In addition, the number of CVs must be 
small enough to avoid exceedingly long computa- 
tional time, while being able to distinguish among 
the different conformational states of the system. 
Consequently, identifying a set of CVs appropri- 
ate for describing complex processes involves a 
right understanding of the physics and chemistry 
of the process under study.— Choosing a correct 
set of CVs thus remains a challenge, as a whole, 
independently of the enhanced sampling technique 
one could consider. A way to overcome the diffi- 



culties to extract the functionally relevant motions 
from MD simulations is to use collective coordi- 
nates identifying a low dimensional subspace in 
which the significant, functional protein motions 
are expected to take place. Principal Component 
Analysis (PCA) of the structure fluctuations of a 
protein, also called essential dynamics, is a pow- 
erfull method to represent in a space of only a few 
collective coordinates (typically between 1 and 3) 
both the functional modes of proteins in their na- 
tive stated"— and the folding pathways of certain 
proteins.- 3 -*^ Therefore PCA emerges in the last 
few years as a relevant candidate to reconstruct 
the protein free-energy landscape both with stan- 
dard metaD and WTmetaD.—"^ 3 - These methods 
show the efficience of this particular class of CVs 
in terms of convergence time and free-energy re- 
construction accuracy. However it is well known 
that the smooth appearance of the FES obtained 
with PCA of the fluctuations of the Cartesian co- 
ordinates of the atoms often represents an artifact 
of the mixing of the internal and overall motions of 
the protein.— Indeed, whereas the overall transla- 
tion of a molecule can readily be separated from its 
internal deformations by requiring that the molec- 
ular center of mass is fixed, the elimination of 
overall rotation of the molecule from its internal 
motions is only possible for relatively rigid molec- 
ular structure, i.e. for which the structural fluctu- 
ations can be described as small departures from 
a reference molecular structure. In this case, the 
separation can be achieved by superimposing the 
structures at the snapshots of a MD trajectory to a 
reference structure, such that the root-mean- square 
deviations of the fluctuations of the atom posi- 
tions is minimized.— While this procedure cor- 
rectly removes the overall rotation in the limit of 
small fluctuations, it breaks down in the case of 
large amplitude motions, where it is not possible 
to unambiguously define a single reference struc- 
ture. It is then interesting to consider the use of 
linear correlation analysis for internal coordinates, 
for which the overall motion is safely eliminated 
without the use of any arbitrary reference struc- 
ture. The dihedral angles are appropriate internal 
coordinates for biopolymers. However, attention 
has to be paid to the periodicity of such quantities 
as the arithmetic mean of dihedral angles can not 
be easily calculated as in Cartesian coordinates. 
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One can circumvent this problem by considering 
the method referred to as dihedral Principal Com- 
ponent analysis (dPCA) based on (Ramachadran) 
dihedral angles iid 6 . in dPCA, one must perform a 
transformation from the space of dihedral angles to 
a metric coordinate space built up by the trigono- 
metric functions of the dihedral angles. The result- 
ing representation of the dihedral angles is unique 
and therefore allows us to readily calculate the 
mean and the covariance matrix. 

A constraint of using PCA (or dPCA) to select 
the relevant CVs representing the functional mo- 
tions of a protein, is that some prior knowledge of 
the dynamics of the system under study is neces- 
sary. The structural fluctuations of the molecule 
must be determined from a preliminary short un- 
biased MD simulation to which dPCA is applied. 

In the present paper we perform biased (250 
ns) and unbiased (2.6 lis) simulations of a penta- 
peptide, Met-enkephalin (Tyr-Gly-Gly-Phe-Met), 
which is one of the smallest neurotransmitter pep- 
tides. 47 Because of its important roles in physio- 
logical processes as for examples the mediation of 
pain and respiratory depression, this neuropeptide 
has been the subject of numerous studies both ex- 
perimental—"— and numerical.— 1 ^"— This small 
flexible peptide does not adopt a single confor- 
mation in aqueous solution, that is consistent 
with the requirement that Met-enkephalin be suf- 
ficiently flexible to bind to different opioid recep- 
tors,— and its free-energy surface is complex and 
well-structured. However many aspects of both 
the structure and dynamics of Met-enkephalin in 
aqueous solution remain unsolved. 

We show that coupling WTmetaD with the 
dPCA of the Ramachadran dihedral angles pro- 
vides a rugged FEL of Met-enkephalin, which 
is very different from the result one can ex- 
tract from the coupling of WTmetaD with the 
PCA of the Cartesian coordinates of the atoms. 
We demonstrate that combining WTmetaD with 
dPCA speeds up the reconstruction of a free- 
energy surface of the peptide compared with the 
unbiased MD simulations. Finally, we under- 
line a criterion to limit the number of CVs nec- 
essary to biased the metadynamics simulation in 
terms of the existence of the minima of the one- 
dimensional free-energy profiles associated to Ra- 
machandran dihedral angles along the amino-acid 



sequence. 

2 Computational Method 

MD simulation. All-atom MD simulations in ex- 
plicit water of Met-enkephalin (PDB ID: 1PLW) 
were carried out with the version 4.5.5 of the 
GROMACS software package^ using TIP3P wa- 
ter model- 6 ^ and the Amber99SB force field.- 6 -!- Bi- 
ased simulations were performed using version 1 .3 
of the plugin for free-energy calculation, named 
PLUMED. 62 The Met-enkephalin was solvated in 
892 TIP3P water molecules enclosed in a cubic 
box of 27.7 nm 3 under periodic boundary condi- 
tions. The time step used in all simulations was 
0.001 ps, and the list of neighbors was updated ev- 
ery 0.005 ps with the "grid" method and a cutoff 
radius of 1 .4 nm. The coordinates of all the atoms 
in the simulation box were saved every 1 ps. The 
initial velocities were chosen randomly. We used 
the NPT ensemble with the temperature and pres- 
sure kept to the desired value by using the Bered- 
sen method and an isotropic coupling for the pres- 
sure (T = 300K, %t = 1 fs; Po = 1 bar, coupling 
time Xp = 1 ps). The electrostatic interactions were 
computed by using the particle mesh Ewald (PME) 
algorithm—"— (with a radius of 1 nm) with the 
fast Fourier transform optimization. The cutoff al- 
gorithm was applied for the noncoulomb poten- 
tials with a radius of 1 nm. The energy of the 
model was first optimized with the "steepest de- 
scent minimization" algorithm and then by using 
the "conjugate gradients" algorithm. The system 
was warmed up for 40 ps and equilibrated for 200 
ps with lower restraints, finishing with equilibra- 
tion without restraints at 300 K for 1 ns. The sim- 
ulation was then continued for 2.6 fls. 

Determination of metadynamics CVs. The 
set of CVs used in WTmetaD simulations were 
automatically generated from a dPCA of the 8 
backbone Ramachadran dihedral angles 66 of the 
unbiased MD trajectory. In short, dPCA con- 
sists in diagonalizing the covariance matrix of the 
Cartesian components of the two-dimensional vec- 
tors q(t) representing the set of dihedral angles 
computed from a preliminary unbiased MD run, 
say q(t) = {q\ n = cosfa, q 4n -l = sin0 n , q 4n - 2 = 
cos y/ n , q4 n -3 = sin \j/ n } with 1 < n < 4 in the case 
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of Met-enkephalin. The eigenvalues of the co- 
variance matrix are ordered by decreasing value 
and each collective mode k is characterized by A# 
and by the corresponding eigenvector eW. The 
eigenvectors corresponding to the largest eigen- 
values contain the largest fluctuations and hence 
hold the most important variability of the system. 
The contribution of the backbone dihedral angles 
0„ and \if n to a mode k is quantified by the so-called 
influence^ vi^ : 
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The largest values of the influence v„ reveal the 
dihedral angles which contribute the most to the 
fluctuations in this mode. One or more principal 
components PC^ k ' associated to eigenvectors e^ k \ 
i.e. the projection of the trajectory q(t) on the re- 
spective eigenvectors: 
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where (...) denotes the average over the dura- 
tion of the trajectory, were selected as CVs in 
WTmetaD trajectories of the pentapeptide. Con- 
sidering the 2.6 jls unbiased MD simulation the 
four largest eigenvalues of the covariance matrix 
(Ai,A 2 ,A 3 and A 4 ) describe 18%, 16%, 13% and 
11% of the fluctuations with cosine contents^ of 
0.01, 0.006, 0.004 and 0.003 respectively. The 
same dPCA repeated using only the first 26 ns 
or 260 ns (1% and 10% of the simulation respec- 
tively) of the unbiased trajectory gave similar re- 
sults. In particular, the overlap between the sub- 
space generated by the first two eigenvectors (e^ l \ 
g( 2 )) calculated after the first 26 ns or 260 ns of 
the unbiased MD trajectory and the reference sub- 
space generated by the first two eigenvectors of the 
whole 2.6 }is trajectory are 0.77 and 0.96 respec- 
tively. The overlap is calculated as the root-mean- 
square inner product of the first two PCA eigen- 
vectors.— Each PC( k > is associated with a free- 
energy profile (FEP) Vt corresponding to the pro- 
jection of the FEL of the protein along the collec- 
tive coordinate PC^ k \ and computed by using the 



usual Boltzmann formula 



V k = -kT In 
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where P(PC^) is the probability density of the 
variable PC^ computed over the trajectory con- 
sidered, k is the Boltzmann constant and T is the 
temperature. Similarly, we calculated the free- 
energy surface (FES) based on the two first princi- 
pal components as 
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where P(PC^\PC&>) is the two-dimensional 
probability density computed from the trajectory 
considered. 

WTmetaD simulation. The well-tempered 
variant of the metadynamics enhanced sampling 
technique was used for the all-atom biased MD 
simulations of Met-enkephalin using the CVs se- 
lected from the dPCA of the unbiased MD tra- 
jectory. According to the algorithm introduced 
by Barducci et al.— a Gaussian is deposited ev- 
ery Xq = 4ps with height W = W e~ v(s ' f)/(/ ~ 1)r , 
where Wq = 2 kJ/mol is the initial height, T is 
the temperature of the simulation and / = [T + 
AT)/T =1.5 the bias factor with AT a parameter 
with the dimension of a temperature. The resolu- 
tion of the recovered free-energy surface is deter- 
mined by the width of the Gaussians o = 0. 1 in 
units of the respective CV. We performed 2 runs 
of WTmetaD of Met-enkephalin, each of 250 ns, 
using the dPCA eigenvectors generated from both 
the whole 2.6 jls MD trajectory and the shorter tra- 
jectory of 26 ns (1% of the whole unbiased MD 
simulation). 

Comparison of potential energy functions. To 

quantitatively compare the result of the meta- 
dynamics simulation to the reference unbiased 
molecular dynamics, we consider the distance 
measure introduced by Alonso and Echenique 69 
and previously used in Sutto et al.— to compare 
two different energy functions, i.e. V met (s) and 
Vref(s) expressed in terms of the CVs defined in 
a region Q: 



d A (Vmet , V nf ) = ( ( o 2 met + ol f ) ( 1 



met, re f > 



1/2 
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where o x , with x denoting either met or ref, is 
the statistical variance of the free energy V x de- 
fined by o 2 x = £/ Q «fo(V,(j) - (V x )) 2 . (V x ) = 
ji J a dsV x (s) is the average value of V x in the re- 
gion Q. and N = ds is a normalization constant. 
The variance a x sets the physical scale of the mea- 
sure and confers the energy units to the distance. 
r met.ref is the Pearson correlation coefficient and 
measures the degree of correlation between the 
two energy surfaces. It is defined by r met:re f = 
cov(V met ,V re f)/o met o ref where cov(V met ,V re f) is 
the covariance between the two energies. The d\ 
measure is convenient since it is expressed in en- 
ergy units and can be directly compared to the ther- 
mal fluctuations. 

Clustering analysis. To compute the repre- 
sentative structures of Met-enkephalin in a clus- 
ter, i.e. in a local minima in a predefined space, 
a distance measure between structures is defined 
considering C a -RMSD after fitting to an arbitrary 
structure belonging to the local minimum. The 
following procedure is used. 70 First, the RMSD 
(C a -RMSD) between the coordinates of the C a 
atoms of an arbitrary chosen structure in the clus- 
ter and the coordinates of the C a atoms of all the 
other structures in the cluster is computed. Sec- 
ond, the one-dimensional probability distribution 
P(C a — RMSD) associated to the cluster is cal- 
culated and the coordinates of Met-enkephalin of 
all snapshots corresponding to the maximum value 
of P(C a — RMSD) are extracted from the trajec- 
tory. This leads to a set of representative struc- 
tures (most probable) for the MD run considered 
associated to the cluster. Third, the average struc- 
ture of this cluster is computed and the RMSD 
between this average structure and all the mem- 
bers of the set is calculated. The structure within 
the set of representative structures having the min- 
imum RMSD relative to the average structure of 
the set is chosen as the representative structure of 
the cluster. 

3 Results 

Reference free-energy surface. The reference 
FES is obtained from a 2.6 fls unbiased sim- 
ulation, whose convergence with respect to the 
trigonometric functions of the 8 backbone Ra- 



machandran dihedral angles of Met-enkephalin 
was checked by applying the block analysis— to 
the first dihedral principal components, namely 
dPCW(f) with i from 1 to 4 (cf. ©. This pro- 
cedure consists in dividing the trajectory into M 
time segments (or blocks) and in considering a 
full range of block sizes. Small blocks will tend 
to be highly correlated with neighboring blocks, 
whereas blocks longer than important correlation 
times will only be weakly correlated. The true 
statistical uncertainty, i.e. the standar error, is ob- 
tained when the Block Standard Error (BSE), de- 
fined as BSE(n) = -^=, ceases to vary and reaches 
a plateau, reflecting essentialy independent blocks. 
o n represents the standard deviation among the 
block averages and n is the size of the blocks 
(the number of frames in each block). In \T\ are 
shown the BSEs associated with the first four di- 
hedral principal components for the reference 2.6 
jls unbiased trajectory. We see that the first dPC 
(dPCW) is the one which presents a BSE with the 
slowest convergence, reaching the plateau charac- 
terizing the associated true standard error approx- 
imately for a block size of 200-300 ns. In compar- 
ison, the BSEs associated with dPd 2 ) to dPd 4 ) 
reach a plateau for block size of 150 ns and 10 ns 
respectively. Let us note that this is coherent with 
the property of PCA for which the first PC captures 
the slowest mode corresponding to the maximum 
of variability. 

The principal component analysis also gives in- 
formation about the dimension of the space in 
which the conformational substates of the protein 
are distributed. In this spirit, Kitao et al.- pro- 
posed a classification of principal modes into three 
types of modes: multiply-hierarchical, singly- 
hierarchical and harmonic modes. This classi- 
fication is based on the properties of the FEPs 
Vk computed from the whole unbiased MD tra- 
jectory using the Boltzmann formula (cf. [3]). 
Multiply-hierarchical, singly-hierarchical and har- 
monic modes have a FEP with multiple basins of 
minima, a FEP with a single basin of minima, or 
a single minima harmonic FEP, respectively. Con- 
sidering the approach of Hegger et al.— the dimen- 
sion of the FEL is then defined by the number of 
multiply-hierarchical PCs. [2] illustrates the FEPs 
of the first five dPCs characterized as multiply- 
hierarchical (i.e. they contain more than one ma- 
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Figure 1: Block Standard Error associated with 
the first four dihedral principal components for the 
reference 2.6 /is unbiased MD trajectory of Met- 
enkephalin. The functions increase monotonically 
with the block size and converge asymptotically to 
a constant which is the true standard error associ- 
ated with (dPC®). 

jor basin of minima). For more clarity, the FEPs 
are shifted arbitrarily along the ordinate axis. This 
means that dPC^ with i from 1 to 5 are the main 
contributors to the structural fluctuations of the 
pentapeptide, and are associated with global col- 
lective motions of the system, i.e. they contain the 
most large conformational fluctuations. Indeed the 
protein moving along a multiply-hierarchical prin- 
cipal component significantly changes its intra- 
molecular packing topology. The other dPCs, 
namely dPC^ 6 ) to dPC^ 16 ), have either singly hier- 
archical FEPs mainly related to local fluctuations 
of side chains, or harmonic FEPs involving local 
motions of the backbone that do not contribute 
significantly to global conformational changes of 
Met-enkephalin. It emerges from the analysis of [2] 
that information about the first five dPCs is neces- 
sary to discriminate uniquely the conformational 
states of the protein. It is nevertheless instructive 
to analyse the two-dimensional free-energy map 
(dPC^, dPC (2) ) to link the present analysis to 
the one obtained with a Cartesian PCA in Sutto 
et al.,— and to compare our sampled conforma- 
tions to the litterature values. Visual inspection of 
|3]mainly exhibits a quite rugged and complex free- 
energy landscape which contains numerous min- 
ima. This result is different from the smooth ap- 
pearance and the simple single basin of minima 
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Figure 2: Free-energy profiles of the first five dihe- 
dral principal components for the reference 2.6 [is 
unbiased trajectory. The numbers 1 to 5 in the 
panel refer to label of the dPCs. The free-energy 
profiles are shifted arbitrarily along the ordinate 
axis for clarity. 

with a funnel-like shape observed in Sutto et al.— 
where the Cartesian PCA only reveals 3 shallow 
minima. This is however not surprising as the 
same kind of observations have already been stated 
by Mu et al.— in the case of penta-alanine who re- 
lated this difference to the artifact of the unavoid- 
ing mixing of the internal and overall motion of 
the peptide in the Cartesian PCA of its trajectory. 

Clustering analysis of the (dPC (1 ), dPd 2 )) FES 
(see Computational method) represented in[3]high- 
lights three main basins (Ul, U2 and U3) related 
to extreme values of dPC^ 1 ) and dPC^ 2 ) and iso- 
lated from the rest of the FES with relevant bar- 
rier, and correponding to a set of U-shaped confor- 
mations found by Sanbonmatsu and Garcia— and 
by Henin et al.— Let us note that this is differ- 
ent from the Cartesian PCA of Sutto et al.,— who 
found a single basin in their FES calculation corre- 
sponding to the U-shaped conformations of Met- 
enkephalin. As shown in [3l the basin U 1 of the 
(dPC (1 ), dPd 2 )) FES corresponds to a U-shaped 
conformation in which the phenylalanine and ty- 
rosine side chains are packed. A move from the 
Ul basin to the U2 and U3 basins corresponds 
to torsional deformations of the peptide leading to 
the unpack of the phenylalanine and tyrosine side 
chains. In the torsional deformation from Ul to 
C/3, the left and right arms of the U-shaped confor- 
mations become aligned causing the midsection of 
the backbone to untwist. The passage from one U- 
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Figure 3: Clutering analysis of the free-energy surface (cf. S]) as a function of the projection over the 
first two principal eigenvectors computed from the whole unbiased MD trajectory of Met-enkephalin. 
The contour lines are every 0.5 kgT. The representative conformations (most probable) correspond to U- 
shaped backbone with and without packed aromatic rings (Ul, U2 and U3), elongated conformation (E), 
helix-like structure (A) and an intermediate structure (/). N and C indicate the N-term and C-term of the 
peptide, respectively. This figure has been prepared using Pymol ( [http://www.pymol .org) 
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Figure 4: Influence of the first two dihedral-PC, dPO 1 ) (left panel) and dPC^ 2 ) (right panel) defined as 
the contribution of the backbone dihedral angles (panels A) ans (panel B) to a mode k. The largest 
values of the influence (red color) reveal the dihedral angles which contribute the most to the fluctuations 
in this mode. N and C indicate the N-term and C-term of the peptide, respectively. This figure has been 
prepared using Pymol ( |http://www.pymol.org[ ) 



shape basin to another requires crossing the largest 
activation barriers (of about 3kT = 7.5 kJ/mol) 
on the FES shown in [3l Except the basins Ul, 
U2 and U3, the rest of the FES contains a nu- 
merous minima of comparable probabilities and 
lifetime, in particular more elongated structures 
(E) and helix-like conformations (A) also in agree- 
ment with the aforementionned work. 43 The rela- 
tion between the dPC^ and dPC( 2 ) and the Ra- 
machandran dihedral angles along the sequence of 
the peptide is given by the value of their influence 
(cf. Q]) in the eigenvector and , respectively. 
In the mode 1, the influence is the largest for the 
couple of Ramachandran dihedral angles of GLY3 
(highest) and GLY2 (cf. H]). Not surpringly, the 
small size of the H side-chain of the GLY residues 
permit at the middle part of the peptide chain to be 
very flexible. In mode 2, the highest influence is 
for the Ramachandran dihedral angle *P of PHE4, 
followed by the influence of *F of GLY3. The 
system then undergoes simultaneous torsional and 
compressional deformations along the axis defined 
as the two first dPCA eigenvectors and illustrated 
through the related influence in|4j 

To enforce the pertinence of the analysis in 
the two-dimensional essential space to explore 



the conformational space, we link the clustering 
analysis shown in [3] with the study of the one- 
dimensional FEPs of the 8 Ramachandran dihedral 
angles represented in [51 A similar approach was 
recently used to study the dynamics of side-chains 
of a protein in its native state.— Indeed, by map- 
ping the different points of the two-dimensional 
essential free-energy surface on the set of the free- 
energy profiles along the amino-acid sequence of 
the peptide, we obtain a representation of the con- 
figurational changes in a complete basis, i.e. al- 
lowing the reconstruction of the backbone of the 
pentapeptide. It is relevant to notice that the rep- 
resentative (most probable) structures associated 
to the three main basins U\,U2 and U3 occupy 
the global minima in the one-dimensional FEPs of 
the 8 Ramachandran dihedral angles or metastable 
states very close to these global minima as for U 1 , 
U2 for ^2 and for Ul for ¥4 (cf. [5J. Moreover, 
if we consider the set of structures belonging to 
these three basins U\,U2 and U3 (cf. [3]), we see 
that other local minima are filled (denoted Ul*, 
U2* and U3* in[5J . Indeed, if we do not restrict 
the present analysis to the basins Ul, U2 and U3, 
we notice that the subset of structures in the basins 
U1,U2,U3,A, E, I, extracted from the clustering 
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Figure 5: Representaion of the eight one-dimensional free-energy profiles of the Ramachandran dihedral 
angles for the reference 2. 6jis unbiased simulation. U\, U2, U3, A, E, and / denote the values of the 
Ramachandran dihedral angles of the corresponding representative (most probable) structures shown in[3] 
UV, U2*, U3*,A*, E* and /* denote other structures in the basins Ul, U2, U3,A, E and / within 0.5k B T 
from the global minima of the corresponding basins in[3l 
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Figure 6: Free-energy surfaces for different times as a function of the root-mean- square deviation (Rmsd) 
and the gyration radius (Rgyr) for the reference unbiased simulation (top panel) and the metadynamics 
simulations using the two first dihedral principal components generated from the 2.6 /is unbiased MD 
trajectory (middle panel) and the 26 ns unbiased MD trajectory (bottom panel) as a bias. The contour 
lines are drawn every 0.5 kgT 
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analysis describe all the minima extracted from the 
complete basis of the 8 free-energy profiles shown 
in \5\ More precisely, we obtain the existence of 
these minima but not the uniqueness of the combi- 
nation of the 8 minima. The lack of uniqueness 
could have been expected as we show in [2] that 
the first 5 dPCs were usefull to describe uniquely 
all the different conformations of Met-enkephalin. 
The relevant point is however that the first 2 dPCs 
gives a couple of collective variables sufficient to 
explore all the minima, that represents a criterion 
to limit the numbers of CVs necessary to biased 
the metadynamics simulation. 

Metadynamics reconstruction. The metady- 
namics runs are performed using the first two 
dPCA eigenvectors generated from both the whole 
2.6 lis MD trajectory and a shorter MD trajectory 
of 26 ns corresponding to 1% of the whole unbi- 
ased MD trajectory (see Computational method). 
These collective variables take advantage of the es- 
sential dynamics technique and have been shown 
to be much more efficient than Cartesian PCA 
to describe the complexity of the conformational 
space during protein folding.— ^ 

The ability of these CVs to exhaustively explore 
the conformational space is reflected by the accu- 
racy of the reconstructed FES projected along two 
different and global observables commonly used 
to characterize structural properties of the over- 
all conformations: the root-mean- square deviation 
from the initial structure (Rmsd) and the radius of 
gyration (Rgyr). We see in [6] that, to the differ- 
ence with the unbiased simulation, the position of 
the minimum and the overall topology of the free- 
energy landscape of the metadynamics run per- 
formed using the dPCA eigenvectors calculated 
over the whole unbiased MD trajectory (middle 
panel in [6]) similarly agree with those of the ref- 
erence unbiased simulation (upper right pannel in 
|6]) after the first 100 ns. The same visual analy- 
sis considering the metadynamics run performed 
using the dPCA eigenvectors calculated over the 
26 ns unbiased MD trajectory (bottom panel in [6]), 
clearly highlights a slower exploration of the con- 
formational space of Met-enkephalin in this case, 
as all the local minima are not explored after the 
first 100 ns in comparison with the reference un- 
biased simulation (upper right pannel in[6l see the 
region at (Rmsd, Rgyr) « (0.35, 0.42)). However, 



this difference could have been expected from the 
overlap analysis (see Computational method) as 
the calculated overlap between the dPCA eigen- 
vectors calculated over the 26 ns unbiased sim- 
ulation and over the whole 2.6 }ls simulation is 
0.77. Indeed, the metadynamics algorithm disfa- 
vors the system to explore already visited regions 
along the dPCA eigenvectors which do not corre- 
spond to the optimal principal directions of fluc- 
tuations if computed over a MD trajectory of 26 
ns of duration. In order to asses quantitatively the 




50 100 150 200 

time [ns] 



Figure 7: Comparison of the convergence of the 
two-dimensional FES(Rmsd,Rgyr) for the unbi- 
ased and metadynamics simulations as a func- 
tion of time. The similarities are calculated us- 
ing as reference the 2.6 fls unbiased simulation 
FES shown in[6l The energy-function distance in- 
troduced by Alonso and Echenique is considered. 
The result is presented in units of kgT and the 
dashed line at 1 kgT defines the goal accuracy. 

convergence of the free-energy surface in the two- 
dimensional (Rmsd/Rgyr) representation, we use 
the correlation coefficient introduced by Alonso 
and Echenique,— which allows quantitative mea- 
surement of the similarity between different en- 
ergy potentials (see Computational method). In 
[7] we report these coefficients for the eigenvector 
metadynamics compared to the unbiased MD run 
as a function of time. The unbiased simulation 
reaches the 1 kgT reference threshold after 250 ns, 
which is coherent with the convergence study ex- 
tracted from the block analysis shown in \T\ On 
the other hand, the metadynamics runs (performed 
with the dPCA of the 26 ns or the 2.6/l.s unbi- 
ased MD trajectory) converge faster than the un- 
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Figure 8: Representation of the eight one-dimensional free-energy profiles of the backbone angles com- 
puted from the reference 2.6/l.s unbiased simulation and from the first 150 ns (<1a < 1 kgT) of the metady- 
namics run performed with the dPCA eigenvectors of the 26 ns unbiased MD trajectory. 
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Figure 9: Comparison of the reconstruction of the FEPs associated to the Ramachandran dihedral angle 
$4 for biased and unbiased simulation. The reference 2.6 /is unbiased simulation is represented for com- 
parison. In the left panel, we compare the WTmetaD reconstruction after 50 ns (cIa > 1 kgT) and 150 ns 
(cIa < 1 kgT). In the right panel, the WTmetaD reconstruction at 100 ns (cIa < 1 kgT) is compared to the 
unbiased one (aU > 1 kgT) at the same time. In both cases, we see that the biased or unbiased simulation 
associated to the Alonso and Echenique distance cIa > 1 kgT completely missed the local minimum on 
this timescale. The FEPs are shifted arbitrarily along the ordinate axis for clarity. 



biased one, reaching and staying below the refer- 
ence threshold after less than or approximately at 
100 ns. The convergence analysis represented in 
|7] emphasizes the slower exploration of the confor- 
mational space already stated in the previous vi- 
sual inspection. Indeed, the energy-function dis- 
tance of Alonso and Echenique (cf. [7]) associated 
to the WTmetaD performed with the dPCA of the 
26 ns unbiased MD trajectory reaches the refer- 
ence threshold around 120 ns, i.e. sensibly after 
the one performed with the dPCA of the 2.6 fls un- 
biased MD trajectory (around 90 ns). 

Let us now return to the one-dimensional FEPs 
of the 8 Ramachandran dihedral angles to link the 
conformational space explored by the metadynam- 
ics runs with the convergence analysis based on 
the distance measure of Alonso and Echenique re- 
ported in [7J In [8] it is shown the comparison be- 
tween the one-dimensional FEPs of the dihedral 
angles for the reference 2.6 /is unbiased MD tra- 
jectory and the metadynamics run (performed us- 
ing the dPCA eigenvectors calculated over the 26 
ns unbiased MD trajectory) after 150 ns (cIa < 
1 kgT). We see that the biased MD simulation cap- 
tures all the different wells with very good accu- 
racy for the global minima. We show for compari- 
son in |9] (left panel) that the biased MD simulation 
at 50 ns (cIa > IksT) completely missed the lo- 
cal minimum associated to the Ramachandran di- 



hedral angle $4 at $4 f« 1 rad. This illustrates 
that the convergence criterion of the FES in the 
two-dimensional (Rmsd/Rgyr) representation can 
also be reformulated in terms of the capture of the 
global and local minima in the one-dimensional 
FEPs of the 8 Ramachandran dihedral angles. This 
representation is also interesting to explain the de- 
lay of convergence of the unbiased simulation in 
comparison with the WTmetaD simulation. In [9] 
(right panel), it is shown the one-dimensional FEP 
associated to the backbone angle $4 after 100 ns 
for both the unbiased MD simulation (cIa > 1 kgT) 
and the WTmetaD simulations (cIa < 1 kgT) per- 
formed using the dPCA eigenvectors calculated 
over the 2.6 /is unbiased MD trajectory. We see 
that in both cases, the well associated to the global 
minimum has been accuratly explored. However 
the unbiased simulation completely missed the lo- 
cal minimum associated to the Ramachandran di- 
hedral angle 4>4 at 4>4 f« 1 rad. 

4 Conclusion and Discussion 

In this paper, we considered a Well-Tempered 
metadynamics approach coupled with biased col- 
lective variables generated from a dihedral prin- 
cipal component analysis to reconstruct the con- 
formational free-energy landscape of a small and 
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very diffusive pentapeptide. We compared the ac- 
curacy and computational efficiency of WTmetaD 
with dPCA with a long unbiased MD simulation 
as well as with the previous study of Sutto et 
al.— that considered WTmetaD using biased col- 
lective variables generated from Cartesian PCA. 
We reconstructed the expected free-energy surface 
accurately and quicker than unbiased MD simu- 
lation, which exhibits a quite rugged and com- 
plex appearance different from the single-minima 
funnel-like shape observed with a Cartesian PCA. 
Indeed the dPCA differs from Cartesian PCA by 
the fact it takes advantage of appropriate internal 
coordinates, i.e. the Ramachandran dihedral an- 
gles describing the backbone structure of the pro- 
tein. This approach becomes particularly usefull 
when one considers complex systems as multi- 
domain proteins— for which we observe diffusion 
motions of a domain around another one, or in- 
trinsically disordered proteins - characterized by 
a lack of stable tertiary structure due to high struc- 
tural flexibility of the protein. In these cases, it is 
particularly impossible to unambiguously define a 
single reference structure associated to the Carte- 
sian PCA procedure and necessary to eliminate the 
overall rotations of the molecules. Moreover, we 
underline a new criterion to limit the number of 
collective variables necessary to biased the meta- 
dynamics simulation. We consider the projection 
of the CVs on the complete basis represented by 
the one-dimensional FEPs of the 8 Ramachandran 
dihedral angles. The choice of the two first dPCs 
as CVs for the WTmetaD is then related to the 
capture of the different wells when one considers 
structures extracted from the clustering analysis in 
the free-energy map (dPC^, dPC^ 2 )). This crite- 
rion is different from the one of Sutto et al.,— who 
justified their limitation to the two first PCA eigen- 
vectors, showing that increasing this number con- 
siderably slows the convergence of the simulation, 
performing worse than the unbiased run. 

Let us now discuss the efficiency of the method 
in the particular case of the Met-enkephalin pen- 
tapeptide. Indeed, the dPCA over the 26 ns, cor- 
responding to 1 % of the whole unbiased MD sim- 
ulation, must be linked to the convergence analy- 
sis of the unbiased MD simulation based either on 
the distance measure of Alonso and Echenique (cf. 
[7]), or the block analysis (cf. [[])• We notice actu- 



ally that the unbiased MD simulation converged at 
approximately 250 ns, which is an order of mag- 
nitude smaller than the reference 2.6 jls unbiased 
MD trajectory. Thus the dPCA over the 26 ns only 
corresponds to 10% of the unbiased MD trajectory 
containing information about the dynamics of the 
system. Nevertheless, we reconstructed the FES 
associated to the Met-enkephalin pentapeptide in 
approximately 100 ns which is twice faster than 
the unbiased MD simulation. This relative modest 
succes of WTmetaD to accelerate the exploration 
of the conformational space could have been ex- 
pected. Indeed, the small height of the free-energy 
barriers and diffusive behavior of the dynamics 
of Met-enkephalin is known to pose an additional 
challenge to free-energy-based methods like WT- 
metaD that generally perform better in the pres- 
ence of high free-energy barriers and ballistic dy- 
namics. Let us note that the same modest succes 
of WTmetaD also emerged from the study of Sutto 
et al.— with the use of Cartesian PCA. It is thus in- 
teresting to extend this method over more complex 
systems that present more favorable characteris- 
tics to apply WTmetaD. This work in progress sets 
out to demonstrate this approach on different well- 
know test proteins and more complex systems as 
multi-domain proteins that represent nowadays a 
real challenge for the biophysics community. 
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